Abstract. The variance of the concentration in a sample can be estimated using knowledge of the particle masses, the particle concentrations and the parameter for the dependent selection of particles. A number of variance estimators are constructed, including a class of hybrid estimators.
1. Introduction



Particulate materials are routinely sampled in industries that deal with solid materials, and the concentration of a certain substance of interest in a sample is generally used as an estimate for the corresponding concentration in the population (or batch) from which the sample was taken. Knowledge of the sampling errors made during this estimation, and of their potential magnitude, is a prerequisite for making reliable decisions based on samples. The terms "population" and "batch" are used as synonyms, as the term "batch" is common in the application area of particulate material sampling. Particulate materials are materials that consist of solid objects, called "particles". The size of these particles can range from the scale of a micrometer or smaller for powders to the scale of a centimeter or larger for coarse granular materials.

Our framework can be formulated mathematically as follows. There is a large but finite population made up of $T$ kinds of particles. We let $m_i$ denote the mass of one particle of the $i$th kind and $c_i$ denote the (mass) concentration of the substance of interest in a particle of kind $i$. Both $m_i$ and $c_i$ are assumed to be known or accurately measurable. The population is sampled according to a sampling design, which here means that a single sample containing a number of particles is obtained. The number of particles of the $i$th kind in the sample is counted and recorded as $N_i$. Note that the sample size $\sum_{i=1}^{T} N_i$ is a random variable and is not assumed to be fixed here. The total mass of the sample, $M_{\text{sample}}$, is defined by:

$$M_{\text{sample}} = \sum_{i=1}^{T} N_i m_i$$

and the total mass of the substance of interest in this sample, $A_{\text{sample}}$, is defined by:

$$A_{\text{sample}} = \sum_{i=1}^{T} N_i m_i c_i$$

The sample concentration, denoted by $\hat\theta$, is defined as $\hat\theta = A_{\text{sample}}/M_{\text{sample}}$. It is an estimator for the corresponding population quantity $\theta$, i.e. the ratio of the amount of the substance of interest in the batch, $A_{\text{batch}}$, to the mass of the batch, $M_{\text{batch}}$. This article discusses how to estimate the variance of $\hat\theta$. A general approach to finding variance estimators for $\hat\theta$ is followed, which attempts to construct variance estimators that are applicable to a wide range of possible sampling designs.
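As a minimal illustration of these definitions, the Python sketch below computes $M_{\text{sample}}$, $A_{\text{sample}}$ and $\hat\theta$ from per-kind counts, masses and concentrations. The function name `sample_quantities`, the list-based inputs and the numerical values are assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def sample_quantities(N: Sequence[float], m: Sequence[float],
                      c: Sequence[float]) -> Tuple[float, float, float]:
    """Return (M_sample, A_sample, theta_hat) for one sample (illustrative helper).

    N[i]: number of particles of kind i in the sample
    m[i]: mass of one particle of kind i
    c[i]: mass concentration of the substance of interest in a particle of kind i
    """
    if not (len(N) == len(m) == len(c)):
        raise ValueError("N, m and c need one entry per particle kind")
    M_sample = sum(n_i * m_i for n_i, m_i in zip(N, m))
    A_sample = sum(n_i * m_i * c_i for n_i, m_i, c_i in zip(N, m, c))
    if M_sample <= 0:
        raise ValueError("the sample must have a positive mass")
    return M_sample, A_sample, A_sample / M_sample


# Hypothetical sample: three particles of mass 2.0 at concentration 0.1,
# and one particle of mass 4.0 at concentration 0.5.
M, A, theta_hat = sample_quantities([3, 1], [2.0, 4.0], [0.1, 0.5])
print(M, A, theta_hat)
```

Here $M_{\text{sample}} = 10$, $A_{\text{sample}} = 2.6$ and $\hat\theta = 0.26$, up to floating-point rounding.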

In order to specify estimators for the variance without specifying an explicit sampling design, we need some information concerning the first- and second-order inclusion probabilities, which are assumed to be well defined even though the sampling design remains unspecified here. It will be seen that, for this purpose, a new parameter suffices: the parameter for the dependent selection of particles (denoted by $C_{ij}$ and discussed in section 2 below). Like $m_i$, $c_i$ and $N_i$, $C_{ij}$ is assumed to be known or accurately determinable. Practical determination of the required parameters ($N_i$, $m_i$, $c_i$ and $C_{ij}$) is discussed in section 5.

The variance of $\hat\theta$ is influenced by variations in both $M_{\text{sample}}$ and $A_{\text{sample}}$. Because $M_{\text{sample}}$ can only vary between the mass of the lightest particle in the population and the mass of the population, and $A_{\text{sample}}$ can only vary between zero and the total amount of the substance of interest in the population, the variance of $\hat\theta$ remains finite. A variance estimate can therefore serve as a practically useful quantity for gaining insight into the potential magnitude of the sampling error. In this article, a number of estimators for the variance of $\hat\theta$ will be constructed. For minimization of the bias of $\hat\theta$, the reader is referred to other literature (see e.g. [8]).

2. Dependent selection of particles 

It will be seen that the variance of $\hat\theta$ depends on the particle masses (the values of $m_i$), the particle concentrations (the values of $c_i$), and also on the covariance matrix of the variables $N_1, \ldots, N_T$ via the parameter for the dependent selection of particles ($C_{ij}$), defined by Eq. (1) below. In that expression, the effect of dependent selection of particles on the covariance matrix of $N_1, \ldots, N_T$ is parameterized as:

$$E(N_iN_j) - E(N_i)E(N_j) = \delta_{ij}E(N_i) - C_{ij}E(N_i)E(N_j) \tag{1}$$

where $E(\cdot)$ denotes an expected value and $\delta_{ij}$ is the Kronecker delta, which is one when $i = j$ and zero otherwise. Note that Eq. (1) implies that each $N_i$ does not have to have a marginal Poisson distribution, because the matrix with elements $C_{ij}$ may have non-zero diagonal elements. In addition, $N_i$ and $N_j$ may be correlated, because the off-diagonal elements of this matrix may also be non-zero. In other words, non-zero values of $C_{ii}$ parameterize deviations from the marginal Poisson distribution (for which the variance of $N_i$ equals $E(N_i)$, i.e. $C_{ii} = 0$), while non-zero values of $C_{ij}$ for $i \neq j$ parameterize deviations from zero correlation between $N_i$ and $N_j$.
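To make the parameterization concrete, the following Python sketch evaluates the covariance matrix implied by Eq. (1) for given $E(N_i)$ and $C_{ij}$, and also rearranges Eq. (1) to recover $C_{ij}$ from a mean vector and a covariance matrix. Both function names are hypothetical, and the numbers are illustrative, not data.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def model_covariance(EN: Sequence[float], C: Sequence[Sequence[float]]) -> Matrix:
    """Cov(N_i, N_j) implied by Eq. (1): delta_ij * E(N_i) - C_ij * E(N_i) * E(N_j)."""
    T = len(EN)
    return [[(EN[i] if i == j else 0.0) - C[i][j] * EN[i] * EN[j]
             for j in range(T)] for i in range(T)]


def dependence_from_moments(EN: Sequence[float],
                            cov: Sequence[Sequence[float]]) -> Matrix:
    """Eq. (1) solved for C_ij: (delta_ij * E(N_i) - Cov(N_i, N_j)) / (E(N_i) * E(N_j))."""
    T = len(EN)
    return [[((EN[i] if i == j else 0.0) - cov[i][j]) / (EN[i] * EN[j])
             for j in range(T)] for i in range(T)]


EN = [10.0, 5.0]
# C = 0: variances equal to the means (the Poisson value) and uncorrelated counts.
print(model_covariance(EN, [[0.0, 0.0], [0.0, 0.0]]))
# C_11 = 1/E(N_1): the count of kind 1 no longer fluctuates.
print(model_covariance(EN, [[0.1, 0.0], [0.0, 0.0]]))
```

A negative $C_{ii}$ would instead inflate the variance of $N_i$ above the Poisson value $E(N_i)$.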

It can be proven, under certain conditions (see Appendix A), that $C_{ij}$ can be interpreted as a correction for the dependent selection of particles in the limit of a sufficiently large population. Denoting the inclusion probabilities of a particle belonging to the $i$th and $j$th particle class as $\kappa_i$ and $\kappa_j$ respectively, and denoting the second-order inclusion probability of a pair consisting of a particle of type $i$ and a particle of type $j$ as $\kappa_{ij}$, this can be expressed as:

$$C_{ij} \approx 1 - \frac{\kappa_{ij}}{\kappa_i\kappa_j} \tag{2}$$

In Appendix A a derivation is presented which shows that the '≈' in the above equation can be replaced by '=' when the population (or batch) contains an infinite number of particles of each type or kind. Apart from a sign, the above equation corresponds to the result given by [3]. It follows from Eq. (2) that when $C_{ij} = 0$, the particle selections can be considered approximately independent ($\kappa_{ij} \approx \kappa_i\kappa_j$). It also follows from Eq. (2) that if $C_{ij}$ is positive, $\kappa_{ij} < \kappa_i\kappa_j$, which can be caused by segregation of particles of type $i$ and $j$. If $C_{ij}$ is negative, $\kappa_{ij} > \kappa_i\kappa_j$, which can be caused by grouping of particles of type $i$ and $j$. Thus, the parameter $C_{ij}$ has a physical meaning, which makes the parameterization expressed in Eq. (1) practically significant.
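The approximation in Eq. (2) and the sign interpretation above can be sketched as follows. The inclusion probabilities are hypothetical values chosen only to show the three cases, and the function names are illustrative. A small tolerance is used because floating-point rounding rarely yields an exact zero.

```python
def dependence_parameter(kappa_i: float, kappa_j: float, kappa_ij: float) -> float:
    """Large-population approximation of Eq. (2): C_ij ~ 1 - kappa_ij / (kappa_i * kappa_j)."""
    if kappa_i <= 0 or kappa_j <= 0:
        raise ValueError("first-order inclusion probabilities must be positive")
    return 1.0 - kappa_ij / (kappa_i * kappa_j)


def interpret(C_ij: float, tol: float = 1e-12) -> str:
    """Physical reading of the sign of C_ij, as described in the text."""
    if C_ij > tol:
        return "segregation (kappa_ij < kappa_i * kappa_j)"
    if C_ij < -tol:
        return "grouping (kappa_ij > kappa_i * kappa_j)"
    return "approximately independent selection"


# Hypothetical probabilities: kappa_i = kappa_j = 0.1 and three values of kappa_ij.
for k_ij in (0.008, 0.010, 0.012):
    C = dependence_parameter(0.1, 0.1, k_ij)
    print(f"kappa_ij={k_ij}: C_ij={C:+.3f} -> {interpret(C)}")
```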

3. Construction of variance estimators

3.1. Estimators based on a Taylor expansion. 

3.1.1. Linear terms (i). A Taylor linearization (see e.g. chapter 5 of [1]) of $\hat\theta = A_{\text{sample}}/M_{\text{sample}}$ with respect to deviations of $A_{\text{sample}}$ and $M_{\text{sample}}$ from their expected values is applied:

$$\hat\theta \doteq \frac{E(A_{\text{sample}})}{E(M_{\text{sample}})} + \frac{A_{\text{sample}} - E(A_{\text{sample}})}{E(M_{\text{sample}})} - \frac{E(A_{\text{sample}})}{E(M_{\text{sample}})}\,\frac{M_{\text{sample}} - E(M_{\text{sample}})}{E(M_{\text{sample}})}$$

where $\doteq$ means "is equal to in a first-order Taylor series expansion". From the above linearization it follows that the variance of $\hat\theta$ is approximated by:

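$$V(\hat\theta) \doteq \frac{1}{E^2(M_{\text{sample}})}\left[V(A_{\text{sample}}) + \frac{E^2(A_{\text{sample}})}{E^2(M_{\text{sample}})}V(M_{\text{sample}}) - 2\frac{E(A_{\text{sample}})}{E(M_{\text{sample}})}\operatorname{Cov}(A_{\text{sample}}, M_{\text{sample}})\right]$$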

where $V(\cdot)$ and $\operatorname{Cov}(\cdot,\cdot)$ denote the variance and covariance operators respectively. Expressing the expected values, variances and covariance in terms of the particle numbers $N_i$ and $N_j$ yields:

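$$V(\hat\theta) \doteq \frac{1}{E^2(M_{\text{sample}})}\sum_{i=1}^{T}\sum_{j=1}^{T} m_im_j\left(c_i - \frac{E(A_{\text{sample}})}{E(M_{\text{sample}})}\right)\left(c_j - \frac{E(A_{\text{sample}})}{E(M_{\text{sample}})}\right)\bigl[E(N_iN_j) - E(N_i)E(N_j)\bigr] \tag{3}$$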

Substituting Eq. (1) into Eq. (3) results in:



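$$V(\hat\theta) \doteq \frac{1}{E^2(M_{\text{sample}})}\sum_{i=1}^{T}\sum_{j=1}^{T} m_im_j\left(c_i - \frac{E(A_{\text{sample}})}{E(M_{\text{sample}})}\right)\left(c_j - \frac{E(A_{\text{sample}})}{E(M_{\text{sample}})}\right)\bigl[\delta_{ij}E(N_i) - C_{ij}E(N_i)E(N_j)\bigr]$$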
The above expression is used to derive a variance estimator by replacing $E(N_i)$ by $N_i$ and $E(N_j)$ by $N_j$, so that $E(M_{\text{sample}})$ becomes $M_{\text{sample}}$ and $E(A_{\text{sample}})/E(M_{\text{sample}})$ becomes $\hat\theta$. Regrouping terms slightly yields:

$$V_{T1}(\hat\theta) = \frac{1}{M^2_{\text{sample}}}\sum_{i=1}^{T}\sum_{j=1}^{T} m_im_j(c_i - \hat\theta)(c_j - \hat\theta)\bigl[N_i\delta_{ij} - C_{ij}N_iN_j\bigr] \tag{4}$$

The index T1 is used to denote the above estimator, the variance estimator based on a first-order Taylor expansion. The estimator was also given by [3].
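A direct transcription of Eq. (4) into Python may help when checking numbers by hand. This is a sketch assuming per-kind lists `N`, `m`, `c` and a $T \times T$ nested list `C`; the function name `v_t1` is illustrative.

```python
from typing import Sequence


def v_t1(N: Sequence[float], m: Sequence[float], c: Sequence[float],
         C: Sequence[Sequence[float]]) -> float:
    """First-order Taylor variance estimator V_T1(theta_hat), Eq. (4)."""
    T = len(N)
    M = sum(N[i] * m[i] for i in range(T))          # M_sample
    A = sum(N[i] * m[i] * c[i] for i in range(T))   # A_sample
    theta = A / M                                   # sample concentration
    total = 0.0
    for i in range(T):
        for j in range(T):
            # Plug-in covariance factor N_i * delta_ij - C_ij * N_i * N_j
            bracket = (N[i] if i == j else 0.0) - C[i][j] * N[i] * N[j]
            total += m[i] * m[j] * (c[i] - theta) * (c[j] - theta) * bracket
    return total / M ** 2


zero = [[0.0, 0.0], [0.0, 0.0]]
# Two unit-mass particles of each of two kinds, concentrations 0 and 1, C = 0.
print(v_t1([2, 2], [1.0, 1.0], [0.0, 1.0], zero))
```

For this hypothetical sample $\hat\theta = 0.5$ and the estimate is $(2 \cdot 0.25 + 2 \cdot 0.25)/16 = 0.0625$. With a single kind of particle, $c_1 - \hat\theta = 0$ and the estimate is zero, as expected since $\hat\theta$ then cannot vary.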

3.1.2. Second-order terms (ii). Although the first-order Taylor series expansion is expected to work well in most cases, it is of interest to construct a novel estimator based on the second-order Taylor expansion. The second-order Taylor expansion of $\hat\theta$ is:

$$\hat\theta \doteq \frac{2A_{\text{sample}}}{E(M_{\text{sample}})} - \frac{2E(A_{\text{sample}})M_{\text{sample}}}{E^2(M_{\text{sample}})} - \frac{A_{\text{sample}}M_{\text{sample}}}{E^2(M_{\text{sample}})} + \frac{E(A_{\text{sample}})M^2_{\text{sample}}}{E^3(M_{\text{sample}})} + \text{cst} \tag{5}$$



where $\doteq$ now means "is equal to in a second-order Taylor series expansion" and cst represents terms that are constant and therefore do not influence the variance of $A_{\text{sample}}/M_{\text{sample}}$. Calculating the variance directly from the above equation would require the evaluation of terms like $E(A^x_{\text{sample}}M^y_{\text{sample}})$ with $x + y = 3$ or $x + y = 4$, for which Eq. (1), which only specifies second-order moments, is of no help. To evaluate the variance of Eq. (5) with the help of Eq. (1), further assumptions are therefore required. A statistical model is proposed and used here, which states that a new quantity, $B_{\text{sample}}$, defined as:

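$$B_{\text{sample}} = A_{\text{sample}} + \bigl(E(M_{\text{sample}}) - M_{\text{sample}}\bigr)\frac{\operatorname{Cov}(A_{\text{sample}}, M_{\text{sample}})}{V(M_{\text{sample}})} \tag{6}$$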

is independent of $M_{\text{sample}}$. This model leads to a zero covariance between $B_{\text{sample}}$ and $M_{\text{sample}}$, while preserving the covariance between $A_{\text{sample}}$ and $M_{\text{sample}}$. It is also assumed that $M_{\text{sample}}$ has the same skewness and kurtosis as a normal distribution (this assumption is discussed below). Using this statistical model, this assumption and Eq. (5), a lengthy computation results in:



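$$V(\hat\theta) \doteq \frac{V(B_{\text{sample}})}{E^2(M_{\text{sample}})} + \frac{V(M_{\text{sample}})}{E^4(M_{\text{sample}})}\left[\beta^2E^2(B_{\text{sample}}) + V(B_{\text{sample}}) + 2\beta^2E^2(B_{\text{sample}})\frac{V(M_{\text{sample}})}{E^2(M_{\text{sample}})}\right] \tag{7}$$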

where $\beta$ is defined by:

$$\beta = 1 - \frac{\operatorname{Cov}(A_{\text{sample}}, M_{\text{sample}})\,E(M_{\text{sample}})}{V(M_{\text{sample}})\,E(A_{\text{sample}})} \tag{8}$$

The variable $B_{\text{sample}}$ can be eliminated from Eq. (7) using Eq. (6), which gives $E(B_{\text{sample}}) = E(A_{\text{sample}})$ and $V(B_{\text{sample}}) = V(A_{\text{sample}}) - \operatorname{Cov}^2(A_{\text{sample}}, M_{\text{sample}})/V(M_{\text{sample}})$. The resulting expression depends on the expected values, variances and covariances of $A_{\text{sample}}$ and $M_{\text{sample}}$. In this expression, $A_{\text{sample}}$ and $M_{\text{sample}}$ can be written in terms of the variables $N_i$. Eq. (1) can subsequently be used to eliminate the covariances between $N_i$ and $N_j$ on the right-hand side of the expression. The result is an expression that depends on $m_i$, $c_i$, $E(N_i)$ and $C_{ij}$. The expected values $E(N_i)$ can then be replaced by their sample values $N_i$ to obtain a variance estimator. This estimator (not written out in full here because it is a long equation) is named the variance estimator based on a second-order Taylor expansion and is denoted by the symbol $V_{T2}(\hat\theta)$, where the index T2 refers to the second-order Taylor expansion. When the assumptions with respect to skewness and kurtosis are not met, the estimator $V_{T2}(\hat\theta)$ can have an increased bias. It is expected, however, that in many practical scenarios the sample mass is normally distributed, an assumption that can be tested. If the distribution of sample masses is non-normal, the estimator $V_{T2}(\hat\theta)$ can possibly be adapted to take into account the skewness and excess kurtosis of the distribution of $M_{\text{sample}}$.
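Since $V_{T2}(\hat\theta)$ is not written out in full, the Python sketch below evaluates it numerically instead: it forms plug-in values of $E(M_{\text{sample}})$, $E(A_{\text{sample}})$, $V(M_{\text{sample}})$, $V(A_{\text{sample}})$ and $\operatorname{Cov}(A_{\text{sample}}, M_{\text{sample}})$ via Eq. (1) with $E(N_i)$ replaced by $N_i$, and then applies Eqs. (6) to (8). This moment-level route is an assumption about how the long equation can be evaluated, not a transcription of the authors' closed form; it requires a positive plug-in estimate of $V(M_{\text{sample}})$. The name `v_t2` is hypothetical.

```python
from typing import Sequence


def v_t2(N: Sequence[float], m: Sequence[float], c: Sequence[float],
         C: Sequence[Sequence[float]]) -> float:
    """Plug-in sketch of V_T2(theta_hat): Eqs. (6)-(8) with E(N_i) replaced by N_i."""
    T = len(N)
    M = sum(N[i] * m[i] for i in range(T))          # plug-in for E(M_sample)
    A = sum(N[i] * m[i] * c[i] for i in range(T))   # plug-in for E(A_sample) = E(B_sample)
    VM = VA = CAM = 0.0
    for i in range(T):
        for j in range(T):
            K = (N[i] if i == j else 0.0) - C[i][j] * N[i] * N[j]   # plug-in Cov(N_i, N_j)
            VM += m[i] * m[j] * K                   # V(M_sample)
            VA += m[i] * c[i] * m[j] * c[j] * K     # V(A_sample)
            CAM += m[i] * c[i] * m[j] * K           # Cov(A_sample, M_sample)
    if VM <= 0:
        raise ValueError("the B_sample model needs a positive estimate of V(M_sample)")
    VB = VA - CAM ** 2 / VM                         # V(B_sample) from Eq. (6)
    beta_A = A - CAM * M / VM                       # beta * E(A_sample), Eq. (8)
    return VB / M ** 2 + VM / M ** 4 * (beta_A ** 2 + VB + 2 * beta_A ** 2 * VM / M ** 2)


zero = [[0.0, 0.0], [0.0, 0.0]]
print(v_t2([2, 2], [1.0, 1.0], [0.0, 1.0], zero))
```

For the hypothetical two-kind sample (two unit-mass particles of each kind, concentrations 0 and 1, $C_{ij} = 0$) the plug-in values are $V(M_{\text{sample}}) = 4$, $V(B_{\text{sample}}) = 1$ and $\beta = 0$, so the sketch returns $1/16 + (4/256) \cdot 1 = 0.078125$. Working with $\beta E(A_{\text{sample}})$ directly avoids dividing by $E(A_{\text{sample}})$ when the sample contains none of the substance.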

3.2. A variance estimator based on the Horvitz-Thompson estimator. 

The Horvitz-Thompson estimator for the variance of a π-expanded estimator can be used to find an estimator for the variance of $\hat\theta$. A π-expanded estimator (see e.g. [4]) for the concentration in the batch can be obtained by considering the batch concentration ($\theta = A_{\text{batch}}/M_{\text{batch}}$) to be the population total of $y_i = m_{n(i)}c_{n(i)}/M_{\text{batch}}$, where $m_x$ and $c_x$ denote the mass and concentration of a particle of type $x$ (for $x$ between one and the total number of classes), $n(i)$ denotes the class of the $i$th particle in the batch and $M_{\text{batch}}$ is the total mass of the population (or batch). The expression is:

$$\hat\theta_\pi = \sum_{i=1}^{T}\frac{N_im_ic_i}{M_{\text{batch}}\kappa_i} \tag{9}$$

where $\hat\theta_\pi$ is the π-expanded estimator for the concentration in the batch, $N_i$ is the number of particles in the sample belonging to the $i$th particle class, $m_i$ and $c_i$ are respectively the mass of and the concentration in a particle belonging to the $i$th class, $M_{\text{batch}}$ is the mass of the batch and $\kappa_i$ is defined here as the first-order inclusion probability of a particle belonging to the $i$th class. (In this article two symbols, $\kappa$ and $\pi$, are used to denote inclusion probabilities, with a subtle difference in the meaning of the index: for $\kappa$ the index refers to the kind of the particle, while for $\pi$ the index refers to the particle number. Hence, by the definition adopted in this article, $\kappa_i = \pi_j$ when particle $j$ of the population is of kind $i$. Implicit in the use of the variable $\kappa_i$ is the assumption that $\pi_j$ does not vary between particles of the same kind.)
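A sketch of Eq. (9) in Python, including a check of the special case in which every $\kappa_i$ equals $M_{\text{sample}}/M_{\text{batch}}$, where $\hat\theta_\pi$ coincides with $\hat\theta$. The batch mass and particle data are hypothetical values, and `pi_expanded` is an illustrative name.

```python
from typing import Sequence


def pi_expanded(N: Sequence[float], m: Sequence[float], c: Sequence[float],
                kappa: Sequence[float], M_batch: float) -> float:
    """pi-expanded estimator of the batch concentration, Eq. (9)."""
    return sum(N[i] * m[i] * c[i] / (M_batch * kappa[i]) for i in range(len(N)))


N, m, c = [3, 1], [2.0, 4.0], [0.1, 0.5]
M_batch = 1000.0                                    # hypothetical batch mass
M_sample = sum(n_i * m_i for n_i, m_i in zip(N, m))
kappa = [M_sample / M_batch] * len(N)               # kappa_i = M_sample / M_batch for every class
theta_pi = pi_expanded(N, m, c, kappa, M_batch)
theta_hat = sum(n_i * m_i * c_i for n_i, m_i, c_i in zip(N, m, c)) / M_sample
print(theta_pi, theta_hat)                          # equal up to rounding
```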

A derivation of Eq. (9) was given by [1]. If the sample mass is constant and the first-order inclusion probability is equal to the ratio of the sample mass ($M_{\text{sample}}$) to the batch mass ($M_{\text{batch}}$), the π-expanded estimator becomes equal to the sample concentration $\hat\theta$. Under these assumptions, the following equation for the variance of the sample concentration, based on the general Horvitz-Thompson estimator for the variance of the π-expanded estimator, can be derived:
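$$V_{HT}(\hat\theta) = \sum_{i=1}^{T}\sum_{j=1}^{T} N_iN_j\left(\frac{1}{\kappa_i\kappa_j} - \frac{1}{\kappa_{ij}}\right)\frac{m_ic_im_jc_j}{M^2_{\text{batch}}} + \sum_{i=1}^{T} N_i\left(\frac{1}{\kappa_{ii}} - \frac{1}{\kappa_i}\right)\frac{m_i^2c_i^2}{M^2_{\text{batch}}} \tag{10}$$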

in which $\kappa_{ij}$ is the second-order inclusion probability of a particle pair in which the first particle belongs to the $i$th class and the second to the $j$th class (i.e. a similar relation exists between $\kappa_{ij}$ and $\pi_{ij}$ as between $\kappa_i$ and $\pi_i$). Substitution of Eq. (2) (i.e. $\kappa_{ij} = \kappa_i\kappa_j(1 - C_{ij})$) for the second-order inclusion probability and of $\kappa_i = M_{\text{sample}}/M_{\text{batch}}$ results in:

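$$V_{HT}(\hat\theta) = -\sum_{i=1}^{T}\sum_{j=1}^{T} N_iN_j\frac{C_{ij}}{1 - C_{ij}}\,\frac{m_ic_im_jc_j}{M^2_{\text{sample}}} + \sum_{i=1}^{T} N_i\left(\frac{1}{1 - C_{ii}} - \frac{M_{\text{sample}}}{M_{\text{batch}}}\right)\frac{m_i^2c_i^2}{M^2_{\text{sample}}} \tag{11}$$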

Assuming that the batch (or population) from which the sample was drawn is much larger than the sample, i.e. $M_{\text{batch}} \gg M_{\text{sample}}$, so that $1/(1 - C_{ii}) - M_{\text{sample}}/M_{\text{batch}} \approx 1/(1 - C_{ii})$, the above result can be rearranged to yield:

$$V_{HT}(\hat\theta) = \frac{1}{M^2_{\text{sample}}}\sum_{i=1}^{T}\sum_{j=1}^{T}\frac{N_i\delta_{ij} - C_{ij}N_iN_j}{1 - C_{ij}}\,m_im_jc_ic_j \tag{12}$$

The above estimator is named 'the variance estimator based on the Horvitz-Thompson estimator' and is denoted by the symbol $V_{HT}(\hat\theta)$, where the index HT refers to Horvitz-Thompson. The estimator expressed in Eq. (12) was also given by [3].

Using Eq. (1), it can be proven that, as expected, the above estimator is unbiased when the sample mass $M_{\text{sample}}$ is constant. In practice there will almost always be slight random variations in sample mass, leading to a potential bias in the estimator. It is expected, though, that reducing variations in sample mass (e.g. by using a sampling tool that yields samples of constant mass) will also reduce the possible bias caused by these variations.
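The following Python sketch transcribes Eq. (12); `v_ht` is an illustrative name, and the guard on $C_{ij} < 1$ reflects that Eq. (12) divides by $1 - C_{ij}$. The usage lines illustrate the constant-sample-mass remark with a hypothetical single-kind sample whose count is fixed at $N_1 = 4$: Eq. (1) with a zero variance of $N_1$ then gives $C_{11} = 1/4$, and the estimate is exactly zero, matching the true variance of $\hat\theta$ (which equals $c_1$ in every sample).

```python
from typing import Sequence


def v_ht(N: Sequence[float], m: Sequence[float], c: Sequence[float],
         C: Sequence[Sequence[float]]) -> float:
    """Horvitz-Thompson-based variance estimator V_HT(theta_hat), Eq. (12)."""
    T = len(N)
    M = sum(N[i] * m[i] for i in range(T))
    total = 0.0
    for i in range(T):
        for j in range(T):
            if C[i][j] >= 1.0:
                raise ValueError("Eq. (12) requires C_ij < 1")
            bracket = (N[i] if i == j else 0.0) - C[i][j] * N[i] * N[j]
            total += bracket / (1.0 - C[i][j]) * m[i] * m[j] * c[i] * c[j]
    return total / M ** 2


# Fixed count N_1 = 4 of a single kind: C_11 = 1/4 by Eq. (1).
print(v_ht([4], [1.0], [1.0], [[0.25]]))
# Same sample, but with Poisson-like counts assumed (C = 0).
print(v_ht([4], [1.0], [1.0], [[0.0]]))
```

The first call returns 0.0; the second returns $4/16 = 0.25$, showing how strongly the estimate depends on the assumed $C_{ij}$.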

3.3. Adaptations of variance estimators. In Eq. (12), the factor $[N_i\delta_{ij} - C_{ij}N_iN_j]/(1 - C_{ij})$ stands out. The question therefore arises whether the variance estimator expressed in Eq. (4) might be improved by replacing the factor $[N_i\delta_{ij} - C_{ij}N_iN_j]$ by $[N_i\delta_{ij} - C_{ij}N_iN_j]/(1 - C_{ij})$, based on an analogy with $V_{HT}(\hat\theta)$. This results in the following variance estimator:

$$V_{AD1}(\hat\theta) = \frac{1}{M^2_{\text{sample}}}\sum_{i=1}^{T}\sum_{j=1}^{T}\frac{N_i\delta_{ij} - C_{ij}N_iN_j}{1 - C_{ij}}\,m_im_j(c_i - \hat\theta)(c_j - \hat\theta) \tag{13}$$

where the index AD1 refers to 'first adaptation'. Using Eq. (1), it can be proven that the expected value of the above estimator is equal to the right-hand side of Eq. (3) if statistical fluctuations in $M_{\text{sample}}$ and $\hat\theta$ are disregarded. The above estimator can also be considered a variant of the estimator based on the Horvitz-Thompson estimator, in which $c_i$ and $c_j$ are replaced by $(c_i - \hat\theta)$ and $(c_j - \hat\theta)$ respectively.
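A Python sketch of Eq. (13) follows, together with a numerical check of the expectation argument: with $E(N_iN_j)$ taken from Eq. (1), the expected value of $[N_i\delta_{ij} - C_{ij}N_iN_j]/(1 - C_{ij})$ equals $\operatorname{Cov}(N_i, N_j)$, which is why $V_{AD1}(\hat\theta)$ reproduces the right-hand side of Eq. (3) on average when $M_{\text{sample}}$ and $\hat\theta$ are held fixed. Function names and numbers are illustrative.

```python
from typing import Sequence


def v_ad1(N: Sequence[float], m: Sequence[float], c: Sequence[float],
          C: Sequence[Sequence[float]]) -> float:
    """First adaptation V_AD1(theta_hat), Eq. (13)."""
    T = len(N)
    M = sum(N[i] * m[i] for i in range(T))
    theta = sum(N[i] * m[i] * c[i] for i in range(T)) / M
    total = 0.0
    for i in range(T):
        for j in range(T):
            bracket = (N[i] if i == j else 0.0) - C[i][j] * N[i] * N[j]
            total += bracket / (1.0 - C[i][j]) * m[i] * m[j] * (c[i] - theta) * (c[j] - theta)
    return total / M ** 2


def expected_adjusted_bracket(EN: Sequence[float], C: Sequence[Sequence[float]],
                              i: int, j: int) -> float:
    """E[N_i*delta_ij - C_ij*N_i*N_j] / (1 - C_ij), with E(N_i N_j) from Eq. (1)."""
    d = 1.0 if i == j else 0.0
    cov = d * EN[i] - C[i][j] * EN[i] * EN[j]       # Eq. (1)
    e_ninj = cov + EN[i] * EN[j]                     # E(N_i N_j)
    return (d * EN[i] - C[i][j] * e_ninj) / (1.0 - C[i][j])


EN = [10.0, 5.0]                                     # hypothetical expected counts
C = [[0.02, -0.01], [-0.01, 0.03]]                   # hypothetical dependence parameters
for i in range(2):
    for j in range(2):
        cov = (EN[i] if i == j else 0.0) - C[i][j] * EN[i] * EN[j]
        print(i, j, expected_adjusted_bracket(EN, C, i, j), cov)   # the two columns agree
```

With all $C_{ij} = 0$ the adaptation has no effect and $V_{AD1}(\hat\theta)$ equals $V_{T1}(\hat\theta)$.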

A second adaptation is obtained by replacing the factors $[N_i\delta_{ij} - C_{ij}N_iN_j]$ by $[N_i\delta_{ij} - C_{ij}N_iN_j]/(1 - C_{ij})$ in the equation for $V_{T2}(\hat\theta)$. The resulting estimator is denoted by the symbol $V_{AD2}(\hat\theta)$.
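$V_{AD2}(\hat\theta)$ can be evaluated numerically by forming plug-in values of the moments of $A_{\text{sample}}$ and $M_{\text{sample}}$ via Eq. (1), dividing each factor $N_i\delta_{ij} - C_{ij}N_iN_j$ by $1 - C_{ij}$ before the moments are summed, and then applying Eqs. (6) to (8). In the Python sketch below, a hypothetical flag `adapted` switches between this adaptation and the unadapted $V_{T2}(\hat\theta)$. Evaluating the moments rather than the long closed form is an assumption of this sketch, and a positive plug-in estimate of $V(M_{\text{sample}})$ is required.

```python
from typing import Sequence


def v_t2_family(N: Sequence[float], m: Sequence[float], c: Sequence[float],
                C: Sequence[Sequence[float]], adapted: bool = False) -> float:
    """Plug-in evaluation of V_T2 (adapted=False) or V_AD2 (adapted=True)."""
    T = len(N)
    M = sum(N[i] * m[i] for i in range(T))
    A = sum(N[i] * m[i] * c[i] for i in range(T))
    VM = VA = CAM = 0.0
    for i in range(T):
        for j in range(T):
            K = (N[i] if i == j else 0.0) - C[i][j] * N[i] * N[j]
            if adapted:
                K /= 1.0 - C[i][j]                  # the AD2 replacement
            VM += m[i] * m[j] * K
            VA += m[i] * c[i] * m[j] * c[j] * K
            CAM += m[i] * c[i] * m[j] * K
    if VM <= 0:
        raise ValueError("a positive plug-in estimate of V(M_sample) is required")
    VB = VA - CAM ** 2 / VM                         # V(B_sample)
    beta_A = A - CAM * M / VM                       # beta * E(A_sample)
    return VB / M ** 2 + VM / M ** 4 * (beta_A ** 2 + VB + 2 * beta_A ** 2 * VM / M ** 2)


C = [[0.05, 0.0], [0.0, 0.05]]                      # hypothetical dependence parameters
print(v_t2_family([2, 2], [1.0, 1.0], [0.0, 1.0], C),
      v_t2_family([2, 2], [1.0, 1.0], [0.0, 1.0], C, adapted=True))
```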



3.4. A variance estimator based on the Sen-Yates-Grundy variance estimator. Sen [S] and Yates and Grundy [B] derived the following estimator for the variance of the π-expanded estimator for the population total of $y_i$:

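$$V_{SYG} = \frac{1}{2}\sum_{k\in s}\sum_{l\in s,\,l\neq k}\frac{\pi_k\pi_l - \pi_{kl}}{\pi_{kl}}\left(\frac{y_k}{\pi_k} - \frac{y_l}{\pi_l}\right)^2$$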
where $s$ denotes the set of particles in the sample, and $\pi_k$, $\pi_l$ and $\pi_{kl}$ denote respectively the inclusion probabilities of the $k$th and $l$th particle of the population and the second-order inclusion probability of a pair consisting of the $k$th and $l$th particle. For fixed-size samples and positive second-order inclusion probabilities, the above estimator is unbiased. Note that the indices $k$ and $l$ refer to a particle number in the population.

Applying the equation to the π-expanded estimator for the batch concentration (see Eq. (9)) and using $\pi_i = \kappa_{n(i)}$ (Eq. (15)), $\pi_{ij} = \kappa_{n(i)n(j)}$ (Eq. (16)) and $\kappa_{ij} = \kappa_i\kappa_j(1 - C_{ij})$ (Eq. (2)) yields:



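$$V_{SYG}(\hat\theta) = \frac{1}{2}\sum_{k\in s}\sum_{l\in s,\,l\neq k}\left(\frac{m_{n(k)}c_{n(k)}}{M_{\text{batch}}\kappa_{n(k)}} - \frac{m_{n(l)}c_{n(l)}}{M_{\text{batch}}\kappa_{n(l)}}\right)^2\frac{C_{n(k)n(l)}}{1 - C_{n(k)n(l)}}$$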



Substituting $\kappa_{n(i)} = M_{\text{sample}}/M_{\text{batch}}$ results in:

$$V_{SYG}(\hat\theta) = \frac{1}{2M^2_{\text{sample}}}\sum_{k\in s}\sum_{l\in s,\,l\neq k}\bigl(m_{n(k)}c_{n(k)} - m_{n(l)}c_{n(l)}\bigr)^2\frac{C_{n(k)n(l)}}{1 - C_{n(k)n(l)}}$$

The summations can be rewritten as summations over the particle classes: 

$$V_{SYG}(\hat\theta) = \frac{1}{2M^2_{\text{sample}}}\sum_{i=1}^{T}\sum_{j=1}^{T}N_iN_j(m_ic_i - m_jc_j)^2\frac{C_{ij}}{1 - C_{ij}} \tag{14}$$

The above estimator is named 'the estimator based on the Sen-Yates-Grundy estimator' and is denoted by the symbol $V_{SYG}(\hat\theta)$, where the index SYG refers to Sen-Yates-Grundy.
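To illustrate the regrouping from particle pairs to particle classes, the Python sketch below evaluates both the particle-level double sum over ordered pairs $k \neq l$ of sampled particles and Eq. (14). Same-class pairs contribute nothing because $(m_ic_i - m_ic_i)^2 = 0$, so counting $N_iN_j$ rather than $N_i(N_i - 1)$ pairs on the diagonal makes no difference and the two forms agree. Names and values are illustrative, and all $C_{ij}$ are assumed to be below one.

```python
from itertools import permutations
from typing import List, Sequence


def v_syg(N: Sequence[float], m: Sequence[float], c: Sequence[float],
          C: Sequence[Sequence[float]]) -> float:
    """Class-level Sen-Yates-Grundy-based estimator, Eq. (14)."""
    T = len(N)
    M = sum(N[i] * m[i] for i in range(T))
    total = 0.0
    for i in range(T):
        for j in range(T):
            diff = m[i] * c[i] - m[j] * c[j]        # zero on the diagonal
            total += N[i] * N[j] * diff ** 2 * C[i][j] / (1.0 - C[i][j])
    return total / (2.0 * M ** 2)


def v_syg_particles(kinds: List[int], m: Sequence[float], c: Sequence[float],
                    C: Sequence[Sequence[float]]) -> float:
    """Particle-level form; kinds[k] is the class n(k) of the k-th sampled particle."""
    M = sum(m[a] for a in kinds)
    total = 0.0
    for k, l in permutations(range(len(kinds)), 2):  # ordered pairs with k != l
        a, b = kinds[k], kinds[l]
        total += (m[a] * c[a] - m[b] * c[b]) ** 2 * C[a][b] / (1.0 - C[a][b])
    return total / (2.0 * M ** 2)


m, c = [1.0, 1.0], [0.0, 1.0]
C = [[0.0, 0.5], [0.5, 0.0]]                          # hypothetical dependence parameters
print(v_syg([2, 2], m, c, C), v_syg_particles([0, 0, 1, 1], m, c, C))
```

Both forms give $8/(2 \cdot 16) = 0.25$ for this hypothetical sample.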



4. Hybrid variance estimators 

Hybrid forms, which are combinations of the estimators constructed in section 3, can be constructed in order to combine the strengths of the estimators developed so far. Here a class of hybrid estimators is derived by combining the variance estimator based on the first-order Taylor linearization, $V_{T1}(\hat\theta)$, with the variance estimator based on the Horvitz-Thompson estimator, $V_{HT}(\hat\theta)$. As noted above, $V_{HT}(\hat\theta)$ is unbiased when the variance of $M_{\text{sample}}$ is zero. On the other hand, the variance estimator $V_{T1}(\hat\theta)$ is designed to take linear variations in $M_{\text{sample}}$ into account. A possibly suitable hybrid form would therefore be:



$$V_{HYB}(\hat\theta) = \alpha V_{T1}(\hat\theta) + (1 - \alpha)V_{HT}(\hat\theta)$$



where $\alpha$ would ideally be a monotonically increasing function of $V(M_{\text{sample}})$, with $\alpha = 0$ when $V(M_{\text{sample}}) = 0$ and $0 < \alpha < 1$ when $V(M_{\text{sample}}) > 0$. Combining the definition of $M_{\text{sample}}$ and Eq. (1) results in:

$$V(M_{\text{sample}}) = \sum_i E(N_i)m_i^2 - \sum_i\sum_j C_{ij}E(N_i)E(N_j)m_im_j$$

Replacing $E(N_i)$ by $N_i$, as for the other estimators, gives a plug-in estimator for the variance of $M_{\text{sample}}$:

$$\hat V(M_{\text{sample}}) = \sum_i N_im_i^2 - \sum_i\sum_j C_{ij}N_iN_jm_im_j$$

The relative standard deviation (RSD) of $M_{\text{sample}}$ can therefore be estimated using:

$$\mathrm{RSD}(M_{\text{sample}}) = \begin{cases}\sqrt{\hat V(M_{\text{sample}})}\,/\,M_{\text{sample}} & \text{if } \hat V(M_{\text{sample}}) \geq 0\\ 0 & \text{if } \hat V(M_{\text{sample}}) < 0\end{cases}$$

A flexible choice for $\alpha$ would therefore be:

$$\alpha = 1 - e^{-\mathrm{RSD}(M_{\text{sample}})/x}$$

in which a numerical value can be assigned to $x$. The resulting variance estimator is:

$$V_{HYBX}(\hat\theta) = \alpha V_{T1}(\hat\theta) + (1 - \alpha)V_{HT}(\hat\theta)$$

The index HYBX refers to hybrid-X, where X indicates the value of $x$: e.g. two candidates are $V_{HYB01}(\hat\theta)$ and $V_{HYB05}(\hat\theta)$ for $x = 0.01$ and $x = 0.05$ respectively.
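A compact Python sketch of the hybrid estimator, assuming per-kind lists as input; `v_hybx` and the helper names are illustrative. It combines transcriptions of Eqs. (4) and (12) with the plug-in estimate of $V(M_{\text{sample}})$, the RSD rule (RSD set to zero for a negative variance estimate) and $\alpha = 1 - e^{-\mathrm{RSD}/x}$.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def _k(N: Sequence[float], C: Matrix, i: int, j: int) -> float:
    """Plug-in factor N_i*delta_ij - C_ij*N_i*N_j (Eq. (1) with E(N_i) -> N_i)."""
    return (N[i] if i == j else 0.0) - C[i][j] * N[i] * N[j]


def v_t1(N, m, c, C: Matrix) -> float:
    """Eq. (4)."""
    T = len(N)
    M = sum(N[i] * m[i] for i in range(T))
    theta = sum(N[i] * m[i] * c[i] for i in range(T)) / M
    return sum(m[i] * m[j] * (c[i] - theta) * (c[j] - theta) * _k(N, C, i, j)
               for i in range(T) for j in range(T)) / M ** 2


def v_ht(N, m, c, C: Matrix) -> float:
    """Eq. (12); assumes all C_ij < 1."""
    T = len(N)
    M = sum(N[i] * m[i] for i in range(T))
    return sum(_k(N, C, i, j) / (1.0 - C[i][j]) * m[i] * m[j] * c[i] * c[j]
               for i in range(T) for j in range(T)) / M ** 2


def v_m_hat(N, m, C: Matrix) -> float:
    """Plug-in V(M_sample): sum_i N_i m_i^2 - sum_ij C_ij N_i N_j m_i m_j."""
    T = len(N)
    return sum(m[i] * m[j] * _k(N, C, i, j) for i in range(T) for j in range(T))


def v_hybx(N, m, c, C: Matrix, x: float) -> float:
    """Hybrid estimator alpha * V_T1 + (1 - alpha) * V_HT, with x > 0."""
    M = sum(n_i * m_i for n_i, m_i in zip(N, m))
    v_m = v_m_hat(N, m, C)
    rsd = math.sqrt(v_m) / M if v_m >= 0 else 0.0   # RSD rule for a negative estimate
    alpha = 1.0 - math.exp(-rsd / x)                # alpha = 0 when the RSD is 0
    return alpha * v_t1(N, m, c, C) + (1.0 - alpha) * v_ht(N, m, c, C)


zero = [[0.0, 0.0], [0.0, 0.0]]
print(v_hybx([2, 2], [1.0, 1.0], [0.0, 1.0], zero, 0.01),
      v_hybx([2, 2], [1.0, 1.0], [0.0, 1.0], zero, 0.05))
```

For a single kind with a fixed count ($C_{11} = 1/N_1$) the plug-in variance of $M_{\text{sample}}$ is zero, so $\alpha = 0$ and the hybrid reduces to $V_{HT}(\hat\theta)$. For the hypothetical two-kind sample above the estimated RSD is 0.5, so with $x = 0.01$ the weight $\alpha = 1 - e^{-50}$ is numerically one and the hybrid equals $V_{T1}(\hat\theta) = 0.0625$; with $x = 0.05$ it moves slightly towards $V_{HT}(\hat\theta) = 0.125$.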

5. Discussion of practical determination of input parameters 

5.1. Determination of $M_{\text{sample}}$ and $\hat\theta$. It is generally possible to determine the sample mass $M_{\text{sample}}$ accurately by weighing. The sample concentration $\hat\theta$ is generally determined using chemical, physical or biological testing in a laboratory, as determining this value is generally part of sampling and estimating the concentration in the population. It can therefore be assumed that a numerical value for $\hat\theta$ is known.

5.2. Determination of $N_i$. For small samples it might be feasible to classify particles one by one by hand and to establish $N_i$ for each class by counting. However, practical samples may contain thousands of small particles, which may also be difficult to classify visually. Indirect or automated methods to determine $N_i$ need to be applied in these cases. It might also be possible to evaluate the variance estimators without a numerical value of $N_i$: all estimators except $V_{SYG}$ depend on $N_i$ only through the product $N_im_i$, which is equal to the total mass of material belonging to the $i$th particle class in the sample. If the materials belonging to the separate classes in the sample can be separated, $N_im_i$ can be determined directly by weighing.
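The remark that $N_i$ enters through $N_im_i$ can be checked for $V_{T1}(\hat\theta)$: writing $W_i = N_im_i$ for the weighed mass of class $i$ in the sample, the factor $m_im_j[N_i\delta_{ij} - C_{ij}N_iN_j]$ in Eq. (4) becomes $\delta_{ij}m_iW_i - C_{ij}W_iW_j$, so only $W_i$ and the known $m_i$ are needed. The Python sketch below compares the two forms; names and numbers are hypothetical.

```python
from typing import Sequence


def v_t1(N: Sequence[float], m: Sequence[float], c: Sequence[float],
         C: Sequence[Sequence[float]]) -> float:
    """Eq. (4) with particle counts N_i."""
    T = len(N)
    M = sum(N[i] * m[i] for i in range(T))
    theta = sum(N[i] * m[i] * c[i] for i in range(T)) / M
    total = 0.0
    for i in range(T):
        for j in range(T):
            bracket = (N[i] if i == j else 0.0) - C[i][j] * N[i] * N[j]
            total += m[i] * m[j] * (c[i] - theta) * (c[j] - theta) * bracket
    return total / M ** 2


def v_t1_from_class_masses(W: Sequence[float], m: Sequence[float], c: Sequence[float],
                           C: Sequence[Sequence[float]]) -> float:
    """Eq. (4) rewritten with W_i = N_i * m_i, the sample mass of class i."""
    T = len(W)
    M = sum(W)
    theta = sum(W[i] * c[i] for i in range(T)) / M
    total = 0.0
    for i in range(T):
        for j in range(T):
            # m_i m_j [N_i delta_ij - C_ij N_i N_j] = delta_ij m_i W_i - C_ij W_i W_j
            bracket = (m[i] * W[i] if i == j else 0.0) - C[i][j] * W[i] * W[j]
            total += (c[i] - theta) * (c[j] - theta) * bracket
    return total / M ** 2


N, m, c = [3, 1], [2.0, 4.0], [0.1, 0.5]
C = [[0.01, 0.0], [0.0, 0.02]]                      # hypothetical values
W = [n_i * m_i for n_i, m_i in zip(N, m)]           # weighed class masses: 6.0 and 4.0
print(v_t1(N, m, c, C), v_t1_from_class_masses(W, m, c, C))
```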

5.3. Determination of $m_i$ and $c_i$. Within a class, particle mass ($m_i$) and concentration ($c_i$) are constant, so in principle determining the mass of one particle by weighing and its concentration by chemical, physical or biological testing would suffice to establish the particle properties of the entire class. However, it is recommended here to analyze more than one particle to verify that $m_i$ and $c_i$ are constant within a particle class. It may also be infeasible to analyze a single particle if it is too small. In some cases, there may be prior knowledge about the material types. In those cases, $c_i$ can possibly be estimated using the known material properties. The particle mass can sometimes be determined as the product of particle volume and material density.

5.4. Determination of $C_{ij}$. It has been discussed by [2] that a negative value of $C_{ij}$ implies grouping or clustering of particles, while a positive value of $C_{ij}$ implies segregation of particles. Research is currently being conducted to evaluate $C_{ij}$ using image analysis and/or a modeling approach based on particle properties. First principles of such an image-based approach to determine the value of $C_{ij}$ have recently been established [7].
6. Conclusion

Six variance estimators for use in the application area of particulate material 
sampling were constructed. 
Appendix A. The parameter for the dependent selection of particles

If $\pi_i$ and $\pi_{ij}$ are respectively the first-order inclusion probability of the $i$th particle and the second-order inclusion probability of the $i$th and $j$th particle of the population, and $n(i)$ and $n(j)$ are the class numbers of the $i$th and $j$th particle, then we consider the class of sampling designs for which:

$$\kappa_{n(i)} = \pi_i \tag{15}$$

$$\kappa_{n(i)n(j)} = \pi_{ij} \tag{16}$$

Because each particle belonging to an arbitrary class $i$ has a probability $\kappa_i$ of being included in the sample, the expected number of particles belonging to the $i$th class in the sample is equal to:

$$E(N_i) = \kappa_iN_{i,\text{batch}} \tag{17}$$

where $N_{i,\text{batch}}$ is the number of particles belonging to the $i$th class in the population. From the above equation it follows that $\kappa_i = E(N_i)/N_{i,\text{batch}}$. Similarly to the above derivation, an expression for the second-order inclusion probability $\kappa_{ij}$ can be derived. Note that $\kappa_{ij} = \kappa_i \times \kappa_j$ only when the two particles are selected independently. In all other cases, a correction factor is required [1]. This is written as:

$$\kappa_{ij} = \kappa_i \times \kappa_j \times (1 - C'_{ij}) \tag{18}$$
where $C'_{ij}$ is the 'parameter for the dependent selection of particles'. It will now be proven that $C'_{ij} \approx C_{ij}$. A population with $N_{i,\text{batch}}$ and $N_{j,\text{batch}}$ particles belonging to the $i$th and $j$th class ($i \neq j$) contains $N_{i,\text{batch}} \times N_{j,\text{batch}}$ pairs of particles where the first particle is of type $i$ and the second of type $j$. If $i = j$, there are $N_{i,\text{batch}} \times (N_{j,\text{batch}} - 1)$ such pairs. For the sample, the numbers of pairs follow a similar pattern: $N_i \times N_j$ pairs if $i \neq j$ and $N_i \times (N_j - 1)$ if $i = j$. Because the expected number of pairs in the sample is equal to the number of pairs in the population multiplied by the probability of a pair being included in the sample, the expected number of pairs is written as:

$$E\bigl(N_i \times (N_j - \delta_{ij})\bigr) = N_{i,\text{batch}} \times (N_{j,\text{batch}} - \delta_{ij}) \times \kappa_{ij} \tag{19}$$

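where $\delta_{ij}$ is the Kronecker delta, whose value is one if $i = j$ and zero otherwise. By Eq. (1), the left-hand side of Eq. (19) equals $E(N_i)E(N_j)(1 - C_{ij})$; combining this with Eqs. (17) and (18) gives $N_{i,\text{batch}}N_{j,\text{batch}}(1 - C_{ij}) = N_{i,\text{batch}}(N_{j,\text{batch}} - \delta_{ij})(1 - C'_{ij})$, from which the following expression is obtained: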

$$C_{ij} = C'_{ij} + \delta_{ij}\frac{1 - C'_{ii}}{N_{i,\text{batch}}} \tag{20}$$

In the limit of infinite numbers of particles within each class of particles in the population (or batch), $C_{ij} = C'_{ij}$. This proves Eq. (2) in the main text.
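As a worked check of Eq. (20), consider a hypothetical design that is not discussed in the text: simple random sampling without replacement of exactly $n$ particles from a batch of $N_{1,\text{batch}}$ particles of a single kind. Then $\kappa_1 = n/N_{1,\text{batch}}$ and $\kappa_{11} = n(n-1)/(N_{1,\text{batch}}(N_{1,\text{batch}} - 1))$, and because the count is fixed, Eq. (1) with a zero variance of $N_1$ requires $C_{11} = 1/n$. The exact-arithmetic Python sketch below computes $C'_{11}$ from Eq. (18), applies Eq. (20), and shows $C'_{11}$ approaching $C_{11}$ as the batch grows. Function names are illustrative.

```python
from fractions import Fraction


def c_from_c_prime(c_prime_ij: Fraction, c_prime_ii: Fraction,
                   n_i_batch: int, same_class: bool) -> Fraction:
    """Eq. (20): C_ij = C'_ij + delta_ij * (1 - C'_ii) / N_i,batch."""
    correction = (1 - c_prime_ii) / n_i_batch if same_class else Fraction(0)
    return c_prime_ij + correction


def srswor_single_kind(n: int, n_batch: int):
    """Hypothetical design: draw exactly n of n_batch single-kind particles without replacement."""
    kappa_1 = Fraction(n, n_batch)                              # first-order inclusion probability
    kappa_11 = Fraction(n * (n - 1), n_batch * (n_batch - 1))   # second-order inclusion probability
    c_prime_11 = 1 - kappa_11 / kappa_1 ** 2                    # Eq. (18) solved for C'_11
    c_11 = c_from_c_prime(c_prime_11, c_prime_11, n_batch, same_class=True)
    return c_prime_11, c_11


for n_batch in (100, 10_000, 1_000_000):
    c_prime_11, c_11 = srswor_single_kind(10, n_batch)
    print(n_batch, float(c_prime_11), c_11)   # C_11 stays exactly 1/10; C'_11 approaches it
```

For $n = 10$ and $N_{1,\text{batch}} = 100$, $C'_{11} = 1/11$ while $C_{11} = 1/11 + (10/11)/100 = 1/10$.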